Performance Limits Due to Inter-Cluster Data Forwarding in Wire-Limited ILP Microprocessors

Authors

Lucian Codrescu

James Meindl

Scott Wills

Abstract

The growing speed gap between transistors and wire interconnects is forcing the development of distributed, or clustered, architectures. These designs partition the chip into small regions with fast intra-cluster communication; communication between clusters incurs longer latency. Hardware, software, or both are responsible for scheduling instructions onto clusters so that critical-path communication occurs within a cluster. This paper explores fundamental interactions between semiconductor technology and clustered architectures. It investigates the relationship between key technology parameters (inter-cluster wire delay and transistor switching delay) and key architecture parameters (superscalar vs. multithreaded instruction dispatch, and value prediction support). The GENESYS modeling tool is used to predict inter-cluster latencies as VLSI technology advances. The study shows that performance limits of the conventional superscalar approach are substantially higher with zero-delay wires. As wire delay increases, the performance of these designs degrades quickly. Threaded designs are more tolerant of wire delay. The optimal thread size changes with advancing VLSI technology, suggesting a highly adaptive architecture. Value prediction is shown to be useful in all cases, but provides more benefit to the multithreaded design.

Similar resources

A wire delay-tolerant reconfigurable unit for a clustered programmable-reconfigurable processor

Wire delay is rapidly becoming a major bottleneck in reconfigurable systems, creating a significant gap between the clock rates of reconfigurable logic and custom circuits. In this paper, we describe the design of the reconfigurable clusters on the Amalgam clustered programmable-reconfigurable processor. Amalgam’s reconfigurable clusters are divided into four segments of reconfigurable logic, l...

The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors

Current microprocessors incorporate techniques to aggressively exploit instruction-level parallelism (ILP). This paper evaluates the impact of such processors on the performance of shared-memory multiprocessors, both without and with the latency-hiding optimization of software prefetching. Our results show that, while ILP techniques substantially reduce CPU time in multiprocessors, they are les...

Architectural support for thread communications in multi-core processors

In the ongoing quest for greater computational power, efficiently exploiting parallelism is of paramount importance. Architectural trends have shifted from improving single-threaded application performance, often achieved through instruction-level parallelism (ILP), to improving multithreaded application performance by supporting thread-level parallelism (TLP). Thus, multi-core processors incorp...

Energy-Aware Probabilistic Epidemic Forwarding Method in Heterogeneous Delay Tolerant Networks

Due to the increasing use of wireless communications, infrastructure-less networks such as Delay Tolerant Networks (DTNs) should be highly considered. DTN is most suitable where there is an intermittent connection between communicating nodes such as wireless mobile ad hoc network nodes. In general, a message sending node in DTN copies the message and transmits it to nodes which it encounters. A...

The Performance Impact of Exploiting Branch ILP with Tree Representation of ILP Code

Modern single-CPU microprocessors exploit instruction-level parallelism (ILP) by deriving their performance advantage mainly from parallel execution of ALU and memory instructions within a single clock cycle. This performance advantage obtained by exploiting data ILP is severely offset by sequential execution of conditional branches, especially in branch-intensive non-numerical code. Consequent...

Publication date: 2000
